<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="http://egonw.github.io/blog/feed/by_tag/scholia.xml" rel="self" type="application/atom+xml" /><link href="http://egonw.github.io/blog/" rel="alternate" type="text/html" /><updated>2026-06-21T11:07:45+00:00</updated><id>http://egonw.github.io/blog/feed/by_tag/scholia.xml</id><title type="html">chem-bla-ics</title><subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle><author><name>Egon Willighagen</name></author><entry><title type="html">CIP rules for stereochemistry</title><link href="http://egonw.github.io/blog/2010/04/22/cip-rules-for-stereochemistry.html" rel="alternate" type="text/html" title="CIP rules for stereochemistry" /><published>2010-04-22T00:00:00+00:00</published><updated>2010-04-22T00:00:00+00:00</updated><id>http://egonw.github.io/blog/2010/04/22/cip-rules-for-stereochemistry</id><content type="html" xml:base="http://egonw.github.io/blog/2010/04/22/cip-rules-for-stereochemistry.html"><![CDATA[<p>Uniquely identifying <a href="http://en.wikipedia.org/wiki/Stereochemistry">stereochemical enantiomers</a> is an important aspect of data
exchange of chemical structures. The simplest, most neglected solution is to pass around 3D models, but a lot of people like to
stick to things like <a href="http://www.opensmiles.org/">SMILES</a>, or <a href="http://www.chem.qmul.ac.uk/iupac/">IUPAC names</a>. Now, given that
we want to uniquely represent the stereochemistry, we can use special rules. One example for enantiomers is the
<a href="http://en.wikipedia.org/wiki/Cahn%E2%80%93Ingold%E2%80%93Prelog_priority_rules">Cahn-Ingold-Prelog (CIP) rules</a>.</p>

<p>The <a href="http://cdk.sf.net/">CDK</a> does not have an implementation of (part of) the CIP rules. However, we recently started a
collaboration with Dr <a href="http://se.linkedin.com/pub/lars-carlsson/9/641/203">Lars Carlsson</a> in the Computational Toxicology,
Global Safety Assessment group at <a href="http://www.astrazeneca.se/om_oss/verksamheten-i-Sverige/Forskning/Molndal/?itemId=3118278">AstraZeneca R&amp;D Mölndal</a>,
headed by Dr Scott Boyer. Within this collaboration I have started a partial implementation of the CIP rules. The full set of
rules is quite extensive, and some subrules are outside the scope of the collaboration. For example, we will likely not look at
axial or helical stereochemistry within this collaboration. What the patch can already do is distinguish between these
mirror images (yeah, I should use <a href="http://www.jmol.org/">Jmol</a>, but <a href="https://qlever.scholia.wiki/">Scholia</a> needs more plugging
right now: click the images):</p>

<p><a href="https://qlever.scholia.wiki/chemical/Q105313057" imageanchor="1" style="margin-left: 1em; margin-right: 1em;">
  <img border="0" src="/blog/assets/images/(R)-bromo(chloro)iodomethane.png" />
</a>
<a href="https://qlever.scholia.wiki/chemical/Q140196930" imageanchor="1" style="margin-left: 1em; margin-right: 1em;">
  <img border="0" src="/blog/assets/images/(S)-bromo(chloro)iodomethane.png" />
</a></p>

<p>The current patch is not looking into the problem of which atom is chiral; that problem is quite complex in itself, and Tim
is writing up a nice set of <a href="http://timvdm.blogspot.com/2010/03/detecting-stereogenic-units-alternative.html">blogs</a>
<a href="http://timvdm.blogspot.com/2009/09/more-para-stereocenters-permutation.html">about</a> <a href="http://timvdm.blogspot.com/2009/09/as-promised-here-are-some-molecules.html">that</a>.
Further, the current aim focuses only on atoms of ligancy four; that is, carbons.</p>

<p>The CIP rules uniquely define the stereochemistry of such a carbon by uniquely ordering the ligands around the atom. The
ligands are ordered using a sequence of rules that assign priority based on atomic number, mass number, etc. It is the
recursion that makes things more interesting, but I will not delve into the details of the algorithm here (see the aforelinked
Wikipedia page instead, or a cheminformatics book). Here, I want to introduce some of the API
of the current patch for the CDK.</p>

<h2 id="ligands-and-their-priorities">Ligands and their Priorities</h2>

<p>Core to the implementation are the CIP priority rules, which allow ordering of the ligands. So, we define a molecule and two ligands:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">IMolecule</span> <span class="n">molecule</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="na">parseSmiles</span><span class="o">(</span><span class="s">"IC(Br)(Cl)[H]"</span><span class="o">);</span>
<span class="nc">ILigand</span> <span class="n">ligand1</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Ligand</span><span class="o">(</span>
  <span class="n">molecule</span><span class="o">,</span> <span class="n">molecule</span><span class="o">.</span><span class="na">getAtom</span><span class="o">(</span><span class="mi">1</span><span class="o">),</span> <span class="n">molecule</span><span class="o">.</span><span class="na">getAtom</span><span class="o">(</span><span class="mi">2</span><span class="o">)</span>
<span class="o">);</span>
<span class="nc">ILigand</span> <span class="n">ligand2</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Ligand</span><span class="o">(</span>
  <span class="n">molecule</span><span class="o">,</span> <span class="n">molecule</span><span class="o">.</span><span class="na">getAtom</span><span class="o">(</span><span class="mi">1</span><span class="o">),</span> <span class="n">molecule</span><span class="o">.</span><span class="na">getAtom</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span>
<span class="o">);</span>
<span class="nc">ISequenceSubRule</span> <span class="n">rule</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">CIPLigandRule</span><span class="o">();</span>
<span class="nc">Assert</span><span class="o">.</span><span class="na">assertEquals</span><span class="o">(-</span><span class="mi">1</span><span class="o">,</span> <span class="n">rule</span><span class="o">.</span><span class="na">compare</span><span class="o">(</span><span class="n">ligand1</span><span class="o">,</span> <span class="n">ligand2</span><span class="o">));</span>
<span class="nc">Assert</span><span class="o">.</span><span class="na">assertEquals</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="n">rule</span><span class="o">.</span><span class="na">compare</span><span class="o">(</span><span class="n">ligand2</span><span class="o">,</span> <span class="n">ligand1</span><span class="o">));</span>
</code></pre></div></div>

<p>This <a href="http://www.junit.org/">JUnit</a> test looks at the chiral compound given earlier, but without specifying the stereochemistry
using the @@/@ SMILES syntax; we get to that later. Here, the example defines two ligands around atom 1 (which is the carbon;
the index starts at 0). The first ligand is the bromine, the second ligand is the iodine. Because the latter takes priority
according to the CIP rules, compare(ligand1, ligand2) returns -1 and compare(ligand2, ligand1) returns 1.</p>
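
<p>To make the sign convention concrete without needing the CDK, here is a minimal, self-contained sketch. It is not the
patch's <em>CIPLigandRule</em>: the names <em>CipPrioritySketch</em> and <em>SimpleLigand</em> are made up for illustration, and it
only looks at the atom directly attached to the stereocenter (atomic number first, mass number as tie-breaker), so it
skips the recursion that the real rules need.</p>

```java
import java.util.Arrays;

class CipPrioritySketch {

    // Hypothetical stand-in for a ligand: it only describes the atom that is
    // directly attached to the stereocenter. This is not the CDK ILigand type.
    static final class SimpleLigand {
        final String symbol;
        final int atomicNumber;
        final int massNumber;

        SimpleLigand(String symbol, int atomicNumber, int massNumber) {
            this.symbol = symbol;
            this.atomicNumber = atomicNumber;
            this.massNumber = massNumber;
        }
    }

    // Returns -1, 0, or 1, using the same convention as the test above:
    // -1 means the first ligand has the lower priority.
    static int compare(SimpleLigand a, SimpleLigand b) {
        // First rule: the higher atomic number takes priority.
        int byAtomicNumber = Integer.signum(Integer.compare(a.atomicNumber, b.atomicNumber));
        if (byAtomicNumber != 0) {
            return byAtomicNumber;
        }
        // Tie-breaker: the higher mass number takes priority.
        return Integer.signum(Integer.compare(a.massNumber, b.massNumber));
    }

    public static void main(String[] args) {
        SimpleLigand bromine = new SimpleLigand("Br", 35, 79);
        SimpleLigand iodine = new SimpleLigand("I", 53, 127);
        SimpleLigand hydrogen = new SimpleLigand("H", 1, 1);
        SimpleLigand deuterium = new SimpleLigand("D", 1, 2);

        System.out.println(compare(bromine, iodine));     // prints -1
        System.out.println(compare(iodine, bromine));     // prints 1
        System.out.println(compare(hydrogen, deuterium)); // prints -1

        // Sorting with the comparison puts the lowest priority first.
        SimpleLigand[] ligands = {iodine, hydrogen, bromine, deuterium};
        Arrays.sort(ligands, CipPrioritySketch::compare);
        for (SimpleLigand ligand : ligands) {
            System.out.print(ligand.symbol + " ");        // prints H D Br I
        }
        System.out.println();
    }
}
```

<p>The comparison is deliberately a plain three-way compare, so it can be handed to a sort directly; a fuller
implementation would fall through to the neighbours of each ligand atom when both checks tie.</p>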

<h2 id="the-ciptool">The CIPTool</h2>

<p>This <em>CIPLigandRule</em> is used in the CIPTool to provide more user-oriented methods. The goal, obviously, is this bit of code:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">IMolecule</span> <span class="n">molecule</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="na">parseSmiles</span><span class="o">(</span><span class="s">"ClC(Br)(I)[H]"</span><span class="o">);</span>
<span class="nc">LigancyFourChirality</span> <span class="n">chirality</span> <span class="o">=</span>
  <span class="nc">CIPTool</span><span class="o">.</span><span class="na">defineLigancyFourChirality</span><span class="o">(</span>
    <span class="n">molecule</span><span class="o">,</span> <span class="mi">1</span><span class="o">,</span> <span class="mi">4</span><span class="o">,</span> <span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">,</span> <span class="no">STEREO</span><span class="o">.</span><span class="na">CLOCK_WISE</span>
  <span class="o">);</span>
<span class="nc">Assert</span><span class="o">.</span><span class="na">assertEquals</span><span class="o">(</span>
  <span class="no">CIP_CHIRALITY</span><span class="o">.</span><span class="na">R</span><span class="o">,</span>
  <span class="nc">CIPTool</span><span class="o">.</span><span class="na">getCIPChirality</span><span class="o">(</span><span class="n">chirality</span><span class="o">)</span>
<span class="o">);</span>
</code></pre></div></div>

<p>Because we do not have 3D coordinates in our SMILES, we define the stereochemistry as CLOCK_WISE or ANTI_CLOCK_WISE.
The former here means that, looking from the first ligand, the second, third, and fourth ligands are arranged in a
clockwise turn. This uniquely defines the geometrical orientation, but the value flips between CLOCK_WISE and ANTI_CLOCK_WISE
upon every exchange of two ligand atoms. Therefore, we uniquely prioritize the ligands, project, and translate the resulting
CLOCK_WISE or ANTI_CLOCK_WISE into the appropriate R or S stereochemistry.</p>
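
<p>One way to sketch that translation, which is not necessarily how the CIPTool patch does it: count how many pairwise exchanges
are needed to bring the ligands into ascending priority order, flip the winding once for every exchange, and read off the
label. The class and enum names below are hypothetical, and the priorities are simply passed in as numbers (here the atomic
numbers of H, Cl, Br, and I) instead of being computed by the CIP rules.</p>

```java
class ChiralityParitySketch {

    enum Winding { CLOCK_WISE, ANTI_CLOCK_WISE }

    enum Label { R, S }

    static Winding opposite(Winding winding) {
        return winding == Winding.CLOCK_WISE ? Winding.ANTI_CLOCK_WISE : Winding.CLOCK_WISE;
    }

    // priorities[i] is the priority of the i-th ligand in the order given;
    // a higher number means a higher priority, and all four must differ.
    // The winding is defined as in the text: seen from the first ligand.
    static Label label(int[] priorities, Winding winding) {
        // The number of inversions has the same parity as the number of
        // pairwise exchanges needed to sort the ligands in ascending order.
        int inversions = 0;
        for (int i = 0; i < priorities.length; i++) {
            for (int j = i + 1; j < priorities.length; j++) {
                if (priorities[i] > priorities[j]) {
                    inversions++;
                }
            }
        }
        // Every exchange of two ligands flips the winding once.
        Winding sorted = (inversions % 2 == 1) ? opposite(winding) : winding;
        // With the lowest priority ligand first and the others ascending,
        // a clockwise turn seen from that first ligand corresponds to R.
        return sorted == Winding.CLOCK_WISE ? Label.R : Label.S;
    }

    public static void main(String[] args) {
        // Ligand order H, Cl, Br, I, as in the CIPTool example above.
        int[] fromExample = {1, 17, 35, 53};
        System.out.println(label(fromExample, Winding.CLOCK_WISE));      // prints R
        System.out.println(label(fromExample, Winding.ANTI_CLOCK_WISE)); // prints S

        // Exchanging the first two ligands (Cl, H, Br, I) flips the result.
        int[] exchanged = {17, 1, 35, 53};
        System.out.println(label(exchanged, Winding.CLOCK_WISE));        // prints S
    }
}
```

<p>The first call mirrors the JUnit test above: the ligands H, Cl, Br, I with CLOCK_WISE are already in ascending priority
order, so no flip is needed and the result is R.</p>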

<p>That’s all for now. Questions, ideas, and other feedback are most welcome in the comments!</p>]]></content><author><name>Egon Willighagen</name></author><category term="cdk" /><category term="chemistry" /><category term="cheminfo" /><category term="scholia" /><summary type="html"><![CDATA[Uniquely identifying stereochemical enantiomers is an important aspect of data exchange of chemical structures. The simplest, most neglected solution is to pass around 3D models, but a lot of people like to stick to things like SMILES, or IUPAC names. Now, given that we want to uniquely represent the stereochemistry, we can use special rules. One example for enantiomers is the Cahn-Ingold-Prelog (CIP) rules.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://egonw.github.io/blog/blog/assets/images/(R)-bromo(chloro)iodomethane.png" /><media:content medium="image" url="http://egonw.github.io/blog/blog/assets/images/(R)-bromo(chloro)iodomethane.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>